AIbase

# Long video understanding

Eagle2.5 8B
Other
Eagle 2.5 is a vision-language model (VLM) designed for long-context multimodal learning. It can process video sequences of up to 512 frames as well as high-resolution images.
Text-to-Image Transformers Other
nvidia
2,626
8
LLaVAction 0.5B
LLaVAction is a multimodal large language model for action recognition, built on the Qwen2 language model and trained on the EPIC-KITCHENS-100-MQA dataset.
Video-to-Text Transformers English
MLAdaptiveIntelligence
215
1
TimeSformer Base Finetuned K600
TimeSformer is a video classification model based on spatio-temporal attention mechanisms, fine-tuned on the Kinetics-600 dataset.
Video Processing Transformers
fcakyon
20
0
© 2025 AIbase